Abstract

This paper considers the problem of knowledge inference on large-scale imperfect repositories with incomplete coverage by means of embedding entities and relations, as a first attempt. We propose IIKE (Imperfect and Incomplete Knowledge Embedding), a probabilistic model which measures the probability of each belief, i.e. (h, r, t), in large-scale knowledge bases such as NELL and Freebase. Our objective is to learn a better low-dimensional vector representation for each entity (h and t) and relation (r) while minimizing the loss of fitting the corresponding confidence given by machine learning (NELL) or crowdsourcing (Freebase), so that we can use $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$ to assess the plausibility of a belief when conducting inference. We use subsets of those inexact knowledge bases to train our model and test the performance of link prediction and triplet classification on ground-truth beliefs. The results of extensive experiments show that IIKE achieves significant improvement compared with the baseline and state-of-the-art approaches.


1 Introduction 


The explosive growth in the number of web pages has drawn much attention to the study of information extraction [Sarawagi, 2008] in recent decades. The aim is to distill unstructured online texts, so that we can store and exploit the distilled information as structured knowledge. Thanks to the long-term efforts made by experts, crowdsourcing and even machine learning techniques, several web-scale knowledge repositories have been built, such as WordNet^1, Freebase^2 and NELL^3, and most of them contain tens of millions of extracted beliefs which are commonly represented by triplets, i.e. (head_entity, relation, tail_entity).

Although we have gathered colossal quantities of beliefs, state-of-the-art work in the literature [West et al., 2014] reports that our knowledge bases are far from complete. For instance, nearly 97% of persons in Freebase have unknown parents. To populate incomplete knowledge repositories, a large proportion of researchers follow the classical approach of extracting knowledge from texts [Zhou et al., 2005; Bach and Badaskar, 2007; Mintz et al., 2009]. For example, they explore ideal approaches that can automatically generate a precise belief like (Madrid, capital_city_of, Spain) from the sentence "Madrid is the capital and largest city of Spain."^4 on the web. However, even cutting-edge research [Fan et al., 2014] cannot satisfy the demand of web-scale deployment, due to the diversity of natural language expression. Moreover, many implicit relations between two entities which are not recorded by web texts still need to be mined.

^1 http://wordnet.princeton.edu/
^2 https://www.freebase.com/
^3 http://rtw.ml.cmu.edu/rtw/

Therefore, some recent studies focus on inferring undiscovered beliefs based on the knowledge base itself without using extra web texts. One representative idea is to consider the whole repository as a graph where entities are nodes and relations are edges. The canonical approaches [Quinlan and Cameron-Jones, 1993; Lao and Cohen, 2010; Lao et al., 2011; Gardner et al., 2013] generally conduct relation-specific random walk inference based on the local connectivity patterns learnt from the imperfect knowledge graph. An alternative paradigm aims to perform open-relation inference by embedding all the elements, including entities and relations, into low-dimensional vector spaces. The proposed methods [Sutskever et al., 2009; Jenatton et al., 2012; Bordes et al., 2011; Bordes et al., 2013; Socher et al., 2013; Wang et al., 2014] show promising performance, but only by learning from ground-truth training knowledge.

This paper thus contributes a probabilistic knowledge embedding model called IIKE^5 to measure the probability of each triplet, i.e. (h, r, t). Our objective is to learn a better low-dimensional vector representation for each entity (h and t) and relation (r) while minimizing the loss of fitting the corresponding confidence given by machine learning (NELL) or crowdsourcing (Freebase). To the best of our knowledge, IIKE is the first approach that attempts to learn global connectivity patterns for open-relation inference on imperfect and incomplete knowledge bases. In order to demonstrate the effectiveness of the model, we conduct experiments on two tasks involved in knowledge inference, link prediction and triplet classification, using the two repositories mentioned above. Inexact beliefs are used to train our model, and we test the performance on ground-truth beliefs. Results show that IIKE outperforms the other cutting-edge approaches on both types of knowledge bases.

^4 http://en.wikipedia.org/wiki/Madrid
^5 Short for Imperfect and Incomplete Knowledge Embedding.


2 Related Work 

We group recent research work related to self-inferring new beliefs based on knowledge repositories without extra texts into two categories, graph-based inference models [Quinlan and Cameron-Jones, 1993; Lao and Cohen, 2010; Lao et al., 2011; Gardner et al., 2013] and embedding-based inference models [Sutskever et al., 2009; Jenatton et al., 2012; Bordes et al., 2011; Bordes et al., 2013; Socher et al., 2013], and describe the principal differences between them.


• Symbolic representation vs. distributed representation: Graph-based models regard the entities and relations as atomic elements, and represent them in a symbolic framework. In contrast, embedding-based models explore distributed representations by learning a low-dimensional continuous vector representation for each entity and relation.

• Relation-specific vs. open-relation: Graph-based models aim to induce rules or paths for a specific relation first, and then infer the corresponding new beliefs. Embedding-based models, on the other hand, encode all relations into the same embedding space and conduct inference without restriction to any specific relation.


2.1 Graph-based Inference 

Graph-based inference models generally learn the representation for specific relations from the knowledge graph.

N-FOIL [Quinlan and Cameron-Jones, 1993] learns first-order Horn clause rules to infer new beliefs from the known ones. So far, it has helped to learn approximately 600 such rules. However, its ability to perform inference over large-scale knowledge repositories is currently still very limited.

PRA [Lao and Cohen, 2010; Lao et al., 2011; Gardner et al., 2013] is a data-driven random walk model which follows the paths from the head entity to the tail entity on the local graph structure to generate non-linear feature combinations representing the labeled relation, and uses logistic regression to select the significant features which contribute to classifying other entity pairs belonging to the given relation.


2.2 Embedding-based Inference 

Embedding-based inference models usually design various scoring functions $f_r(h, t)$ to measure the plausibility of a triplet (h, r, t). The lower the dissimilarity given by the scoring function $f_r(h, t)$ is, the higher the compatibility of the triplet will be.

Unstructured [Bordes et al., 2013] is a naive model which exploits the occurrence information of the head and the tail entities without considering the relation between them. It defines the scoring function $\|\mathbf{h} - \mathbf{t}\|$, so this model obviously cannot discriminate between pairs of entities involved in different relations. Therefore, Unstructured is commonly regarded as the baseline approach.

Distance Model (SE) [Bordes et al., 2011] uses a pair of matrices $(W_{rh}, W_{rt})$ to characterize a relation r. The dissimilarity of a triplet is calculated by $\|W_{rh}\mathbf{h} - W_{rt}\mathbf{t}\|_1$. As pointed out by Socher et al. [2013], the separate matrices $W_{rh}$ and $W_{rt}$ weaken the capability of capturing correlations between entities and corresponding relations, even though the model takes the relations into consideration.

Single Layer Model, proposed by Socher et al. [2013], thus aims to alleviate the shortcomings of the Distance Model by means of the nonlinearity of a single layer neural network $g(W_{rh}\mathbf{h} + W_{rt}\mathbf{t} + \mathbf{b}_r)$, in which $g = \tanh$. The linear output layer then gives the scoring function: $\mathbf{u}_r^{\top} g(W_{rh}\mathbf{h} + W_{rt}\mathbf{t} + \mathbf{b}_r)$.

Bilinear Model [Sutskever et al., 2009; Jenatton et al., 2012] is another model that tries to fix the issue of weak interaction between the head and tail entities in the Distance Model, using a relation-specific bilinear form: $f_r(h, t) = \mathbf{h}^{\top} W_r \mathbf{t}$.

Neural Tensor Network (NTN) [Socher et al., 2013] designs a general scoring function: $f_r(h, t) = \mathbf{u}_r^{\top} g(\mathbf{h}^{\top} W_r \mathbf{t} + W_{rh}\mathbf{h} + W_{rt}\mathbf{t} + \mathbf{b}_r)$, which combines the Single Layer Model and the Bilinear Model. This model is more expressive, as the second-order correlations are also fed into the nonlinear transformation function, but the computational complexity is rather high.

TransE [Bordes et al., 2013] is a canonical model that differs from all the other prior arts: it embeds relations into the same vector space as entities by regarding the relation r as a translation from h to t, i.e. $\mathbf{h} + \mathbf{r} = \mathbf{t}$. It works well on beliefs with the ONE-TO-ONE mapping property but performs badly on multi-mapping beliefs. Given a series of facts associated with a ONE-TO-MANY relation r, e.g. $(h, r, t_1), (h, r, t_2), \ldots$, TransE tends to make the embeddings of the entities on the MANY side nearly identical, so that they are hard to discriminate.
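The contrast between the Unstructured baseline and TransE can be illustrated with a minimal sketch. The 2-dimensional embeddings below are hypothetical values picked by hand, not learnt vectors, and the function names are ours; the L2 norm is assumed for both scores.

```python
import math

def l2_norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def unstructured_score(h, t):
    """Unstructured: ||h - t||, which ignores the relation entirely."""
    return l2_norm([hi - ti for hi, ti in zip(h, t)])

def transe_score(h, r, t):
    """TransE: ||h + r - t||, small when r translates h onto t."""
    return l2_norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)])

# Hypothetical 2-d embeddings, chosen by hand for illustration only.
madrid, spain, france = [1.0, 0.0], [1.0, 2.0], [3.0, 2.0]
capital_city_of = [0.0, 2.0]

print(transe_score(madrid, capital_city_of, spain))   # plausible belief
print(transe_score(madrid, capital_city_of, france))  # implausible belief
print(unstructured_score(madrid, spain))              # identical for every relation
```

Because `unstructured_score` never sees the relation vector, it returns the same value for (Madrid, Spain) whatever relation is asked about, which is the limitation noted for the baseline.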

TransH [Wang et al., 2014] is the state-of-the-art approach as far as we know. It improves TransE by modeling a relation as a hyperplane, which makes it more flexible with regard to modeling beliefs with multi-mapping properties.

Even though the prior arts of knowledge embedding are promising when conducting open-relation inference on large-scale bases, they all rely on ground-truth beliefs. The model IIKE that we propose belongs to the embedding-based community, but is the first to tackle the problem of knowledge inference based on imperfect and incomplete repositories. Nevertheless, we compare our approach with the methods mentioned above, and assess the performance with both the dataset and the metrics they have used as part of the extensive experiments.


3 Model 

[Figure 1: two panels, "Word Embedding Space" and "Knowledge Embedding Space", each plotting U.S.A, Spain, France, Washington, D.C., Madrid and Paris; the second panel adds capital_city_of arrows.]

Figure 1: The result of vector calculation in the word embedding space: $v_{Madrid} - v_{Spain} + v_{France} \approx v_{Paris}$ and $v_{Madrid} - v_{Spain} + v_{U.S.A} \approx v_{Washington, D.C.}$. The most likely reason for $v_{Spain} - v_{Madrid} \approx v_{France} - v_{Paris}$ and $v_{Spain} - v_{Madrid} \approx v_{U.S.A} - v_{Washington, D.C.}$ is that capital_city_of is the shared relation. In other words, $\mathbf{h}_{Madrid} + \mathbf{r}_{capital\_city\_of} \approx \mathbf{t}_{Spain}$ if the belief (Madrid, capital_city_of, Spain) is plausible.

The plausibility of a belief (h, r, t) can be regarded as the joint probability of the head entity h, the relation r and the tail entity t, namely $Pr(h, r, t)$. Similarly, $Pr(h \mid r, t)$ stands for the conditional probability of predicting h given r and t. We assume that $Pr(h, r, t)$ is collaboratively influenced by $Pr(h \mid r, t)$, $Pr(r \mid h, t)$ and $Pr(t \mid h, r)$; more specifically, it equals the geometric mean of these three conditional probabilities, as shown in the following equation:

$$Pr(h, r, t) = \sqrt[3]{Pr(h \mid r, t)\, Pr(r \mid h, t)\, Pr(t \mid h, r)}. \tag{1}$$
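As a short worked step, taking logarithms of Equation (1) turns the geometric mean into an arithmetic mean of log-probabilities, which is a convenient form for a loss defined in log space. Only symbols already defined above are used:

```latex
\log Pr(h, r, t)
  = \frac{1}{3}\left[\log Pr(h \mid r, t) + \log Pr(r \mid h, t) + \log Pr(t \mid h, r)\right]
```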

Given r and t, there are multiple choices of h' which may appear as the head entity. Therefore, if we use $E_h$ to denote the set of all the possible head entities given r and t, $Pr(h \mid r, t)$ can be defined as

$$Pr(h \mid r, t) = \frac{\exp(D(h, r, t))}{\sum_{h' \in E_h} \exp(D(h', r, t))}. \tag{2}$$


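The other factors, i.e. $Pr(r \mid h, t)$ and $Pr(t \mid h, r)$, are defined accordingly by slightly revising the normalization terms, as shown in Equations (3) and (4), in which $R$ and $E_t$ represent the set of relations and the set of tail entities, respectively.

$$Pr(r \mid h, t) = \frac{\exp(D(h, r, t))}{\sum_{r' \in R} \exp(D(h, r', t))}, \tag{3}$$

$$Pr(t \mid h, r) = \frac{\exp(D(h, r, t))}{\sum_{t' \in E_t} \exp(D(h, r, t'))}. \tag{4}$$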

The last function left unexplained in Equations (2), (3) and (4) is $D(h, r, t)$. We are inspired by the somewhat surprising patterns learnt from word embeddings [Mikolov et al., 2013b], illustrated by Figure 1: the result of a word vector calculation, for instance $v_{Madrid} - v_{Spain} + v_{France}$, is closer to $v_{Paris}$ than to any other word [Mikolov et al., 2013a]. If we study this example, the most likely reason for $v_{Spain} - v_{Madrid} \approx v_{France} - v_{Paris}$ is that capital_city_of is the relation between Madrid and Spain, and likewise between Paris and France. In other words, $\mathbf{h}_{Madrid} + \mathbf{r}_{capital\_city\_of} \approx \mathbf{t}_{Spain}$ if the belief is plausible. Therefore, we define $D(h, r, t)$ as follows from the dissimilarity between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$, measured with the $L_1$ or $L_2$ norm, and set $b$ as the bias parameter.

$$D(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\| + b. \tag{5}$$
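Equations (1) to (5) can be put together in a minimal sketch, computed exactly (with full normalization) on a toy vocabulary. Several things here are assumptions for illustration: the 2-dimensional embeddings and the relation `located_near` are hypothetical, the L2 norm is used, and both $E_h$ and $E_t$ are taken to be the whole entity set.

```python
import math

def d_score(h, r, t, b=7.0):
    """D(h, r, t) = -||h + r - t||_2 + b, as in Equation (5)."""
    dist = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return -dist + b

def softmax_prob(target, scores):
    """exp(target) / sum(exp(s)), shifted by the max for numerical stability."""
    m = max(scores)
    return math.exp(target - m) / sum(math.exp(s - m) for s in scores)

def joint_probability(h, r, t, entities, relations, b=7.0):
    """Geometric mean of the three conditionals, Equations (1) to (4)."""
    d = d_score(h, r, t, b)
    p_h = softmax_prob(d, [d_score(e, r, t, b) for e in entities])   # Eq. (2)
    p_r = softmax_prob(d, [d_score(h, q, t, b) for q in relations])  # Eq. (3)
    p_t = softmax_prob(d, [d_score(h, r, e, b) for e in entities])   # Eq. (4)
    return (p_h * p_r * p_t) ** (1.0 / 3.0)                          # Eq. (1)

# Hypothetical toy embeddings (hand-picked, not learnt).
madrid, spain, paris, france = [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]
capital_city_of, located_near = [0.0, 1.0], [1.0, 0.0]
entities = [madrid, spain, paris, france]
relations = [capital_city_of, located_near]

p_good = joint_probability(madrid, capital_city_of, spain, entities, relations)
p_bad = joint_probability(madrid, capital_city_of, france, entities, relations)
print(p_good, p_bad)
```

On this toy data the plausible belief (Madrid, capital_city_of, Spain) receives a higher joint probability than (Madrid, capital_city_of, France). The cost of the three normalization sums grows with the number of entities and relations, so this exact form is only practical for small vocabularies.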


So far, we have modeled the probability of a belief, i.e. $Pr(h, r, t)$. On the other hand, some imperfect repositories, such as NELL, which is automatically built by machine learning techniques [Carlson et al., 2010], assign a confidence score (in the range [0.5, 1.0]) to evaluate the plausibility of the corresponding belief. Therefore, we define the cost function $C$ shown in Equation (6), and our objective is to learn a better low-dimensional vector representation for each entity and relation while continuously minimizing the total loss of fitting each belief (h, r, t, c) in the training set $\Delta$ to the corresponding confidence c.

$$\begin{aligned}
\arg\min_{h, r, t} C &= \sum_{(h, r, t, c) \in \Delta} \frac{1}{2}\left(\log Pr(h, r, t) - \log c\right)^2 \\
&= \sum_{(h, r, t, c) \in \Delta} \frac{1}{2}\left\{\frac{1}{3}\left[\log Pr(h \mid r, t) + \log Pr(r \mid h, t) + \log Pr(t \mid h, r)\right] - \log c\right\}^2.
\end{aligned} \tag{6}$$
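The per-belief term of the cost function can be sketched as follows. This is an illustration under stated assumptions: the factor 1/2 is our reading of Equation (6), the function names are ours, and the joint probabilities passed in are made-up numbers rather than model outputs.

```python
import math

def belief_loss(joint_prob, confidence):
    """Squared gap between log Pr(h, r, t) and log c for one belief.
    The factor 0.5 is assumed from our reading of Equation (6)."""
    return 0.5 * (math.log(joint_prob) - math.log(confidence)) ** 2

def total_loss(beliefs):
    """Sum of per-belief losses; beliefs is a list of (joint_prob, confidence)."""
    return sum(belief_loss(p, c) for p, c in beliefs)

# Made-up values: the first belief is fitted perfectly, the second is not.
training = [(0.9, 0.9), (0.5, 1.0)]
print(total_loss(training))
```

The loss vanishes only when the modeled probability equals the confidence, so a belief with confidence 0.9 is not pushed all the way to probability 1.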

4 Algorithm 

To search for the optimal solution of Equation (6), we use Stochastic Gradient Descent (SGD) to update the embeddings of entities and relations iteratively. However, it is costly to compute the normalization terms in $Pr(h \mid r, t)$, $Pr(r \mid h, t)$ and $Pr(t \mid h, r)$ directly. Inspired by the work of Mikolov et al. [2013a], we adopt negative sampling to transform the conditional probability functions, i.e. Equations (2), (3) and (4), into binary classification problems, as shown in the following equations:

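$$\log Pr(h \mid r, t) \approx \log Pr(1 \mid h, r, t) + \sum_{i=1}^{k} \mathbb{E}_{h'_i \sim Pr(h' \in E_h)} \log Pr(0 \mid h'_i, r, t), \tag{7}$$

$$\log Pr(r \mid h, t) \approx \log Pr(1 \mid h, r, t) + \sum_{i=1}^{k} \mathbb{E}_{r'_i \sim Pr(r' \in R)} \log Pr(0 \mid h, r'_i, t), \tag{8}$$

$$\log Pr(t \mid h, r) \approx \log Pr(1 \mid h, r, t) + \sum_{i=1}^{k} \mathbb{E}_{t'_i \sim Pr(t' \in E_t)} \log Pr(0 \mid h, r, t'_i). \tag{9}$$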


In Equations (7), (8) and (9), we sample k negative beliefs and discriminate them from the positive case. For this simple binary classification problem, we choose the logistic function with the offset $\epsilon$ shown in Equation (10) to estimate the probability that the given belief (h, r, t) is correct:

$$Pr(1 \mid h, r, t) = \frac{1}{1 + \exp(-D(h, r, t))} + \epsilon. \tag{10}$$
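The negative-sampling estimate for the head entity can be sketched as below. Several choices are assumptions rather than details given in the text: negatives are drawn uniformly from the other entities, $Pr(0 \mid \cdot)$ is taken as $1 - Pr(1 \mid \cdot)$ and clamped to stay positive, the offset is set to 1e-6, and the embeddings are hypothetical hand-picked vectors.

```python
import math
import random

def d_score(h, r, t, b=7.0):
    """D(h, r, t) = -||h + r - t||_2 + b, following Equation (5)."""
    dist = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return -dist + b

def prob_correct(h, r, t, eps=1e-6):
    """Pr(1 | h, r, t): logistic function of D plus the offset eps."""
    return 1.0 / (1.0 + math.exp(-d_score(h, r, t))) + eps

def approx_log_prob_head(h, r, t, entities, k, rng):
    """Estimate log Pr(h | r, t) from one positive and k sampled negatives."""
    total = math.log(prob_correct(h, r, t))
    candidates = [e for e in entities if e != h]
    for _ in range(k):
        h_neg = rng.choice(candidates)  # assumption: uniform sampling
        # Assumption: Pr(0 | .) = 1 - Pr(1 | .), clamped away from zero.
        total += math.log(max(1.0 - prob_correct(h_neg, r, t), 1e-12))
    return total

# Hypothetical toy embeddings (hand-picked, not learnt).
madrid, spain, paris, france = [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]
capital_city_of = [0.0, 1.0]
entities = [madrid, spain, paris, france]

estimate = approx_log_prob_head(madrid, capital_city_of, spain,
                                entities, k=2, rng=random.Random(0))
print(estimate)
```

Each call touches only k + 1 triplets instead of the whole entity set, which is the saving that motivates the approximation. The relation and tail cases follow the same pattern with r or t replaced instead of h.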


We also display the framework of the learning algorithm of IIKE in pseudocode as follows.

Algorithm 1 The Learning Algorithm of IIKE
Input:
    Training set Δ = {(h, r, t, c)}, entity set E, relation set R; dimension of embeddings d, number of negative samples k, learning rate α, convergence threshold η, maximum epochs n.

/*Initialization*/
foreach r ∈ R do
    r := Uniform(…, …)
    r := r / |r|
end foreach
foreach e ∈ E do
    e := Uniform(…, …)
    e := e / |e|
end foreach

/*Training*/
i := 0
while Rel.loss > η and i < n do
    foreach (h, r, t) ∈ Δ do
        foreach j ∈ range(k) do
            Negative sampling: (h'_j, r, t) ∈ Δ'_h
            /*Δ'_h is the set of k negative beliefs replacing h*/
            Negative sampling: (h, r'_j, t) ∈ Δ'_r
            /*Δ'_r is the set of k negative beliefs replacing r*/
            Negative sampling: (h, r, t'_j) ∈ Δ'_t
            /*Δ'_t is the set of k negative beliefs replacing t*/
        end foreach
        Σ_{h, r, t, h', r', t'} 1/2 (log Pr(h, r, t) - log c)^2
        /*Updating embeddings of (h, r, t) ∈ Δ, (h', r, t) ∈ Δ'_h, (h, r', t) ∈ Δ'_r, (h, r, t') ∈ Δ'_t with α and the batch gradients derived from Equations (7), (8), (9) and (10).*/
    end foreach
    i++
end while
Output:
    All the embeddings of h, t and r, where h, t ∈ E and r ∈ R.


5 Experiments

Embedding the entities and relations into low-dimensional vector spaces facilitates several classical knowledge inference tasks, such as link prediction and triplet classification. More specifically, link prediction performs inference by predicting a ranked list of missing entities or relations given the other two elements of a triplet. For example, it can predict a series of t given h and r, or a series of h given r and t. Triplet classification is to discriminate whether a triplet (h, r, t) is correct or wrong.

Several recent research works [Bordes et al., 2013; Socher et al., 2013; Wang et al., 2014] reported that they used subsets of Freebase (FB) data to evaluate their models and showed the performance on the above two tasks, respectively. In order to conduct solid experiments, we compare our model (IIKE) with many related studies, including the baseline and cutting-edge approaches mentioned in Section 2.2. Moreover, we use a larger imperfect and incomplete dataset (NELL) to perform comparisons involving the same tasks to show the superior inference capability of IIKE, and have released this dataset for others to use.

We are also glad to share all the datasets, the source codes and the learnt embeddings for entities and relations, which can be freely downloaded from http://pan.baidu.com/s/ImgxGbgS.

5.1 Link prediction

One of the benefits of knowledge embedding is that we can apply simple vector calculations to many reasoning tasks, and link prediction is a valuable task that contributes to completing the knowledge graph. With the help of knowledge embeddings, if we would like to tell whether the entity h has the relation r with the entity t, we just need to calculate the distance between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$. The closer they are, the more likely it is that the triplet (h, r, t) exists.

Datasets

DATASET             FB15K      NELL
#(ENTITIES)         14,951     74,037
#(RELATIONS)        1,345      226
#(TRAINING EX.)     483,142    713,913
#(VALIDATING EX.)   50,000     7,296
#(TESTING EX.)      59,071     7,296

Table 1: Statistics of the datasets used for link prediction task.


Bordes et al. [2013] released a large dataset (FB15K)^6, extracted from Freebase and constructed by crowdsourcing, in which each belief is a triplet without a confidence score. Therefore, we assign 1.0 to each training triplet by default. We have also identified a larger repository on the web named NELL^7, which is automatically built by machine learning techniques, and each triplet is labeled with a probability estimated by synthetic algorithms [Carlson et al., 2010]. We reserve the beliefs with probability in the range (0.5, 1.0], use the ground-truth (1.0) beliefs as the validating and testing examples, and train the models with the remainder.

Table 1 shows the statistics of these two datasets. The NELL dataset is larger than FB15K, with many more entities but fewer relations, which may lead to differences in parameter tuning^8.

^6 Related studies on this dataset can be looked up from the website https://www.hds.utc.fr/everest/doku.php?id=en:transe
^7 The whole dataset of NELL can be downloaded from http://rtw.ml.cmu.edu/rtw/resources

DATASET: FB15K
                                        MEAN RANK                        MEAN HIT@10
METRIC                                  Raw              Filter          Raw      Filter
Unstructured [Bordes et al., 2014]      1,074 / 14,951   979 / 14,951    4.5%     6.3%
RESCAL [Nickel et al., 2011]            828 / 14,951     683 / 14,951    28.4%    44.1%
SE [Bordes et al., 2011]                273 / 14,951     162 / 14,951    28.8%    39.8%
SME (LINEAR) [Bordes et al., 2014]      274 / 14,951     154 / 14,951    30.7%    40.8%
SME (BILINEAR) [Bordes et al., 2014]    284 / 14,951     158 / 14,951    31.3%    41.3%
LFM [Jenatton et al., 2012]             283 / 14,951     164 / 14,951    26.0%    33.1%
TransE [Bordes et al., 2013]            243 / 14,951     125 / 14,951    34.9%    47.1%
TransH [Wang et al., 2014]              211 / 14,951     84 / 14,951     42.5%    58.5%
IIKE                                    183 / 14,951     70 / 14,951     47.1%    59.7%

Table 2: Link prediction results on the FB15K dataset. We compared our proposed IIKE with the state-of-the-art method TransH and other prior arts mentioned in Section 2.2.

DATASET: NELL
                                        MEAN RANK                        MEAN HIT@10
METRIC                                  Raw              Filter          Raw      Filter
TransE [Bordes et al., 2013]            4,254 / 74,037   4,218 / 74,037  11.0%    12.3%
TransH [Wang et al., 2014]              3,469 / 74,037   2,218 / 74,037  25.2%    41.6%
IIKE                                    2,464 / 74,037   2,428 / 74,037  37.3%    38.2%

Table 3: Link prediction results on the NELL dataset. We compared our proposed IIKE with the cutting-edge methods TransH and TransE.


Evaluation Protocol 

For each testing triplet, all the other entities that appear in the training set take turns to replace the head entity. We then get a set of candidate triplets associated with the testing triplet. The dissimilarity of each candidate triplet is first computed by the scoring function of the model under evaluation, such as $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$, and the candidates are then sorted in ascending order. Finally, we locate the ground-truth triplet and record its rank. The whole procedure runs in the same way when replacing the tail entity, and we report the mean of the results. We use two metrics, i.e. Mean Rank and Mean Hit@10 (the proportion of ground-truth triplets that rank in the top 10), to measure the performance. However, the results measured by those metrics (reported as Raw) are relatively inaccurate, as the procedure above tends to generate false negative triplets. In other words, some of the candidate triplets rank higher than the ground-truth triplet simply because they also appear in the training set. We thus filter out those triplets to report more reasonable results (reported as Filter).
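The ranking protocol can be sketched as follows, for a single testing triplet whose tail is being replaced. The candidate names and dissimilarity values are made up for illustration, and `known` stands for the candidates that already form valid triplets in the training set; the function names are ours.

```python
def rank_of_truth(scores, truth, known=frozenset()):
    """1-based rank of `truth` when candidates are sorted by ascending
    dissimilarity. Candidates listed in `known` are skipped (Filter setting)."""
    target = scores[truth]
    better = [c for c, s in scores.items() if s < target and c not in known]
    return 1 + len(better)

def mean_rank_and_hits(ranks, n=10):
    """Mean Rank and the proportion of ranks within the top n (Mean Hit@n)."""
    mean_rank = sum(ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= n) / len(ranks)
    return mean_rank, hits

# Made-up dissimilarities of candidate tails for one testing triplet.
scores = {"Spain": 0.3, "France": 0.1, "Italy": 0.2, "Peru": 0.9}
raw_rank = rank_of_truth(scores, "Spain")
filtered_rank = rank_of_truth(scores, "Spain", known={"France"})
print(raw_rank, filtered_rank)
print(mean_rank_and_hits([1, 3, 20]))
```

In the Filter setting a candidate that outranks the ground truth but is itself a known valid triplet no longer counts against the model, so the filtered rank is never worse than the raw one.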


^8 It turns out that embedding models prefer a larger dimension of vector representations for the dataset with more entities, and the $L_1$ norm for fewer relations.


Experimental Results 

We compared IIKE with the state-of-the-art TransH, TransE and other models mentioned in Section 2.2, evaluated on FB15K and NELL. We tuned the parameters of each previous model^9 on the validation set and selected the combination of parameters which leads to the best performance. The results of prior arts on FB15K are the same as those reported by Wang et al. [2014]. For IIKE, we tried several combinations of parameters: d ∈ {20, 50, 100}, α ∈ {0.1, 0.05, 0.01, 0.005, 0.002}, b ∈ {7.0, 10.0, 15.0} and norm ∈ {$L_1$, $L_2$}, and finally chose d = 50, α = 0.002, b = 7.0, norm = $L_2$ for the FB15K dataset, and d = 100, α = 0.001, b = 7.0, norm = $L_1$ for the NELL dataset. Moreover, to make responsible comparisons between IIKE and the state-of-the-art approaches, we requested the authors of TransH to re-run their system on the NELL dataset and report the best results. Table 2 demonstrates that IIKE outperforms all the prior arts, including the baseline model Unstructured [Bordes et al., 2014], RESCAL [Nickel et al., 2011], SE [Bordes et al., 2011], SME (LINEAR) [Bordes et al., 2014], SME (BILINEAR) [Bordes et al., 2014], LFM [Jenatton et al., 2012] and TransE [Bordes et al., 2013], and improves on the state-of-the-art TransH [Wang et al., 2014] on the FB15K dataset (Filter Mean Rank 70 vs. 84, Raw Mean Hit@10 47.1% vs. 42.5%). For the NELL dataset, IIKE performs stably on the evaluation metrics compared with TransH and TransE: Table 3 shows that it improves by 28.9% in terms of Raw Mean Rank, and achieves comparable performance on Filter Mean Rank compared with TransH.


^9 All the codes for the related models can be downloaded from https://github.com/glorotxa/SME



















































































5.2 Triplet classification 


Triplet classification is another inference-related task, proposed by Socher et al. [2013], which focuses on searching for a relation-specific threshold $\sigma_r$ to identify whether a triplet (h, r, t) is plausible.


Datasets 


DATASET             FB15K      NELL
#(ENTITIES)         14,951     74,037
#(RELATIONS)        1,345      226
#(TRAINING EX.)     483,142    713,913
#(VALIDATING EX.)   100,000    14,592
#(TESTING EX.)      118,142    14,582

Table 4: Statistics of the datasets used for triplet classification task.


Wang et al. [2014] constructed a standard dataset, FB15K, sampled from Freebase. Moreover, we build another imperfect and incomplete dataset, i.e. NELL, following the same principle that the head or the tail entity can be randomly replaced with another one to produce a negative triplet. In order to build much tougher validation and testing datasets, the principle additionally requires that the picked entity has appeared at the same position at least once. For example, given the positive triplet (Pablo Picasso, nationality, Spanish), (Pablo Picasso, nationality, American) is a potential negative example rather than the obviously irrational (Pablo Picasso, nationality, Van Gogh), as American and Spanish are more common as the tails of nationality. The beliefs in the training sets are the same as those used in link prediction. Table 4 shows the statistics of the standard datasets that we used for evaluating models on the triplet classification task.
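The position-constrained construction of negative triplets can be sketched as below. This is one possible reading, stated as an assumption: the replacement tail must have appeared as a tail of the same relation, and must not produce a known positive triplet. The three triplets are hypothetical toy data, and `corrupt_tail` is our own name.

```python
import random

def corrupt_tail(triplet, triplets, rng=random):
    """Replace the tail with an entity already seen as a tail of the same
    relation (assumed reading of the same-position constraint)."""
    h, r, t = triplet
    known = set(triplets)
    seen_tails = sorted({t2 for (_, r2, t2) in triplets if r2 == r})
    # Skip replacements that would recreate a known positive triplet.
    candidates = [c for c in seen_tails if (h, r, c) not in known]
    return (h, r, rng.choice(candidates)) if candidates else None

# Hypothetical toy knowledge base.
triplets = [
    ("Pablo Picasso", "nationality", "Spanish"),
    ("Mark Twain", "nationality", "American"),
    ("Pablo Picasso", "influenced_by", "Van Gogh"),
]
negative = corrupt_tail(triplets[0], triplets)
print(negative)
```

Here "Van Gogh" is never proposed as a nationality because it has not occurred as a tail of that relation, which is what makes the resulting negatives harder to reject. Head replacement would be symmetric.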


Evaluation Protocol 

The decision strategy for binary classification is simple: if the dissimilarity of a testing triplet (h, r, t) computed by $f_r(h, t)$ is below the relation-specific threshold $\sigma_r$, it is predicted as positive, otherwise negative. The relation-specific threshold $\sigma_r$ can be searched by maximizing the classification accuracy on the validation triplets which belong to the relation r.
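The threshold search for one relation can be sketched as follows. The candidate thresholds (midpoints between consecutive validation scores, plus one value beyond each end) are our assumption, since the text does not say how the search is carried out, and the validation scores are made up.

```python
def best_threshold(examples):
    """examples: list of (dissimilarity, is_positive) for a single relation.
    Returns the (threshold, accuracy) pair with the highest accuracy."""
    scores = sorted({s for s, _ in examples})
    # Assumed candidate set: midpoints plus one value beyond each end.
    candidates = ([scores[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(scores, scores[1:])]
                  + [scores[-1] + 1.0])
    best = None
    for th in candidates:
        # A triplet is predicted positive when its dissimilarity is below th.
        correct = sum((s < th) == pos for s, pos in examples)
        acc = correct / len(examples)
        if best is None or acc > best[1]:
            best = (th, acc)
    return best

# Made-up validation scores for one relation.
validation = [(0.2, True), (0.4, True), (0.7, True), (0.9, False), (1.3, False)]
threshold, accuracy = best_threshold(validation)
print(threshold, accuracy)
```

Running this once per relation yields the set of relation-specific thresholds, which are then applied unchanged to the testing triplets.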


Experimental Results 

We use the best combination of parameter settings from the link prediction task, d = 50, α = 0.002, b = 7.0, norm = $L_2$ for the FB15K dataset and d = 100, α = 0.001, b = 7.0, norm = $L_1$ for the NELL dataset, to generate the entity and relation embeddings, and learn the best classification threshold $\sigma_r$ for each relation r. Compared with several of the latest approaches, i.e. TransH [Wang et al., 2014], TransE [Bordes et al., 2013] and Neural Tensor Network (NTN) [Socher et al., 2013], the proposed IIKE approach still outperforms them, as shown in Table 5. We also drew the precision-recall curves, which indicate the capability of global discrimination by ranking the distance of all the testing triplets, and Figure 2 shows that the AUC (Area Under the Curve) of IIKE is much larger than that of the other approaches.

Figure 2: The comparison of precision-recall curves for triplet classification among the proposed IIKE (red lines), the state-of-the-art approaches TransH (blue lines) and TransE (green lines).


6 Conclusion 

We address the problem of knowledge inference on imperfect and incomplete repositories in this paper, and propose a probabilistic embedding model as a first attempt to tackle this issue by measuring the probability of a given belief (h, r, t). To efficiently learn the embeddings for each entity and relation, we also adopt the negative sampling technique to transform the original model, and present an SGD-based algorithm to search for the optimal solution. Extensive experiments on knowledge inference, including link prediction and triplet classification, show that our approach achieves significant improvement on two large-scale knowledge bases, compared with state-of-the-art and baseline methods.

We look forward to further improvements of the proposed model, which leaves open promising directions for future work, such as taking advantage of the knowledge embeddings to enhance studies of text summarization and open-domain question answering.


DATASET                         FB15K     NELL
NTN [Socher et al., 2013]       66.7%     -
TransE [Bordes et al., 2013]    79.7%     82.4%
TransH [Wang et al., 2014]      80.2%     89.1%
IIKE                            91.1%     91.4%

Table 5: The accuracy of triplet classification compared with several latest approaches: TransH, TransE and NTN.